28 research outputs found

    ChatGPT may Pass the Bar Exam soon, but has a Long Way to Go for the LexGLUE benchmark

    Following the hype around OpenAI's ChatGPT conversational agent, the latest milestone in the recent development of Large Language Models (LLMs) that demonstrate unprecedented emergent zero-shot capabilities, we audit OpenAI's latest GPT-3.5 model, `gpt-3.5-turbo', the first available ChatGPT model, on the LexGLUE benchmark in a zero-shot fashion, providing examples in a templated instruction-following format. The results indicate that ChatGPT achieves an average micro-F1 score of 47.6% across LexGLUE tasks, surpassing the baseline guessing rates. Notably, the model performs exceptionally well on some datasets, achieving micro-F1 scores of 62.8% and 70.2% on the ECtHR B and LEDGAR datasets, respectively. The code base and model predictions are available for review at https://github.com/coastalcph/zeroshot_lexglue. Comment: Working paper.
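    The evaluation setup described above can be illustrated with a small sketch: a templated instruction prompt for zero-shot classification, and a micro-averaged F1 computed over per-example label sets. The template wording and label names here are hypothetical placeholders, not the authors' actual prompt.

```python
def build_prompt(text, labels):
    """Hypothetical instruction template for zero-shot label selection."""
    options = "\n".join(f"- {label}" for label in labels)
    return (
        "Given the following legal passage, choose the most appropriate "
        f"label(s) from the list.\n\nLabels:\n{options}\n\n"
        f"Passage: {text}\nAnswer:"
    )

def micro_f1(gold, pred):
    """Micro-averaged F1: pool true/false positives and negatives
    across all examples before computing precision and recall."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but wrong
        fn += len(g - p)   # gold labels the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```

    Micro-averaging weights every label decision equally, so frequent labels (as in LEDGAR's large label set) dominate the score, which is why it is the standard headline metric for LexGLUE-style multi-label tasks.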

    An empirical study on cross-x transfer for legal judgment prediction

    Cross-lingual transfer learning has proven useful in a variety of Natural Language Processing (NLP) tasks, but it is understudied in the context of legal NLP, and not at all in Legal Judgment Prediction (LJP). We explore transfer-learning techniques for LJP on the trilingual Swiss-Judgment-Prediction dataset, which includes cases written in three languages. We find that cross-lingual transfer improves the overall results across languages, especially when we use adapter-based fine-tuning. We further improve the model's performance by augmenting the training dataset with machine-translated versions of the original documents, yielding a 3x larger training corpus. We then analyze the effect of cross-domain and cross-regional transfer, i.e., training a model across domains (legal areas) or regions. We find that in both settings (legal areas, origin regions), models trained across all groups perform better overall and also improve on the worst-case scenarios. Finally, we report improved results when we ambitiously apply cross-jurisdiction transfer, further augmenting our dataset with Indian legal cases.

    On the Interplay between Fairness and Explainability

    In order to build reliable and trustworthy NLP applications, models need to be both fair across different demographics and explainable. Usually these two objectives, fairness and explainability, are optimized and/or examined independently of each other. Instead, we argue that forthcoming, trustworthy NLP systems should consider both. In this work, we perform a first study to understand how they influence each other: do fair(er) models rely on more plausible rationales, and vice versa? To this end, we conduct experiments on two English multi-class text classification datasets, BIOS and ECtHR, that provide information on gender and nationality, respectively, as well as human-annotated rationales. We fine-tune pre-trained language models with several methods for (i) bias mitigation, which aims to improve fairness, and (ii) rationale extraction, which aims to produce plausible explanations. We find that bias mitigation algorithms do not always lead to fairer models. Moreover, we discover that empirical fairness and explainability are orthogonal. Comment: 15 pages (incl. Appendix), 4 figures, 8 tables.
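    One common way to score how plausible an extracted rationale is against human-annotated rationales, of the kind described above, is token-level F1 between binary rationale masks. This is an assumed illustration of the general metric, not necessarily the exact formulation used in the paper.

```python
def rationale_f1(model_mask, human_mask):
    """Token-level F1 between two binary rationale masks of equal length:
    1 marks a token selected as part of the rationale, 0 otherwise."""
    assert len(model_mask) == len(human_mask)
    tp = sum(m and h for m, h in zip(model_mask, human_mask))
    fp = sum(m and not h for m, h in zip(model_mask, human_mask))
    fn = sum(h and not m for m, h in zip(model_mask, human_mask))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

    A higher score means the tokens the model's extractor highlights overlap more with the tokens human annotators marked as justifying the label.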

    Swiss-Judgment-Prediction: A Multilingual Legal Judgment Prediction Benchmark

    In many jurisdictions, the excessive workload of courts leads to long delays. Suitable predictive AI models can assist legal professionals in their work, and thus enhance and speed up the process. So far, Legal Judgment Prediction (LJP) datasets have been released in English, French, and Chinese. We publicly release a multilingual (German, French, and Italian), diachronic (2000-2020) corpus of 85K cases from the Federal Supreme Court of Switzerland (FSCS). We evaluate state-of-the-art BERT-based methods, including two variants of BERT that overcome its input length limitation of 512 tokens. Hierarchical BERT has the best performance (approx. 68-70% Macro-F1 score in German and French). Furthermore, we study how several factors (canton of origin, year of publication, text length, legal area) affect performance. We release both the benchmark dataset and our code to accelerate future research and ensure reproducibility.
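    The hierarchical idea behind the best-performing variant above can be sketched in a few lines: segment a long document into 512-token chunks, encode each chunk independently, then aggregate the chunk representations into one document vector before classification. The toy mean-of-embeddings encoder here merely stands in for a real BERT encoder.

```python
import numpy as np

def chunk_tokens(tokens, max_len=512):
    """Split a long token sequence into encoder-sized segments."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

def encode_chunk(chunk, embed):
    """Toy stand-in for a BERT encoder: mean of token embeddings."""
    return np.stack([embed[t] for t in chunk]).mean(axis=0)

def hierarchical_encode(tokens, embed, max_len=512):
    """Encode each chunk separately, then mean-pool the chunk vectors
    into a single fixed-size document representation."""
    chunks = chunk_tokens(tokens, max_len)
    return np.stack([encode_chunk(c, embed) for c in chunks]).mean(axis=0)
```

    Real hierarchical BERT models typically replace the final mean-pooling with a learned aggregator (e.g., a small Transformer or attention layer over chunk embeddings), but the chunk-then-aggregate structure is the same.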

    Recognizing the Structure and Elements of Contracts Using Word Embeddings

    Contracts govern business relationships around the world. There is a growing market of professionals who process contracts every day across a wide range of tasks, pursuing both business goals and the necessary legal compliance. Through Natural Language Processing (NLP), we can offer valuable solutions in this area by reengineering contracts' plain text into structured data. The objective of this thesis is to investigate and propose a baseline approach for recognizing and extracting the structure and basic elements of contracts. For this purpose we rely on state-of-the-art language-modeling techniques such as word embeddings. We presume that the use of word embeddings will give extra reliability compared with both hand-crafted feature learning and rule-based approaches. One of our main goals is also for our system (model) to be capable of operating in real-case scenarios, so we evaluate the need for extensive post-processing and orchestration of the discrete components, which are implemented in a divide-and-conquer fashion. This perspective holds high promise both for savings (e.g., cost, time, human effort) and for quality of service, by turning complicated tasks that require repetitive human intervention into reliable automated processes.
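    A crude illustration of the word-embedding idea for element recognition: represent each candidate token or span as a vector and assign it the contract-element label of the most similar labeled prototype vector. The two-dimensional vectors and element names here are toy assumptions, not trained embeddings from the thesis.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_label(vec, prototypes):
    """Assign the label whose prototype embedding is most similar."""
    return max(prototypes, key=lambda lbl: cosine(vec, prototypes[lbl]))

# Toy prototype embeddings for two hypothetical contract elements.
prototypes = {
    "party": np.array([1.0, 0.0]),
    "date":  np.array([0.0, 1.0]),
}
token_vec = np.array([0.9, 0.1])
label = nearest_label(token_vec, prototypes)
```

    In practice the prototypes would come from pre-trained embeddings of annotated examples, and the similarity step would feed a sequence labeler rather than a hard nearest-neighbor rule, but the contrast with hand-crafted features is the same: similarity in embedding space replaces manually engineered cues.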

    Rather a Nurse than a Physician -- Contrastive Explanations under Investigation

    Contrastive explanations, where one decision is explained in contrast to another, are supposed to be closer to how humans explain a decision than non-contrastive explanations, where the decision is not necessarily contrasted with an alternative. This claim has never been empirically validated. We analyze four English text-classification datasets (SST2, DynaSent, BIOS, and DBpedia-Animals). We fine-tune and extract explanations from three different models (RoBERTa, GPT-2, and T5), each in three different sizes, and apply three post-hoc explainability methods (LRP, Gradient x Input, GradNorm). We furthermore collect and release human rationale annotations for a subset of 100 samples from the BIOS dataset in both contrastive and non-contrastive settings. A cross-comparison between model-based rationales and human annotations, in both contrastive and non-contrastive settings, yields high agreement between the two settings for models as well as for humans. Moreover, model-based explanations computed in both settings align equally well with human rationales. Thus, we empirically find that humans do not necessarily explain in a contrastive manner. Comment: 9 pages, long paper at EMNLP 2023 proceedings.
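    For a linear classifier, the Gradient x Input attribution method mentioned above reduces to a closed form: the gradient of a class logit with respect to each input feature is simply that class's weight, so the attribution is weight times input, elementwise. This is a deliberately minimal sketch of the method, not the paper's transformer setup, and the weights and inputs are made up for illustration.

```python
import numpy as np

def gradient_x_input(weights, x, target):
    """Gradient x Input for a linear classifier: logit_c = weights[c] . x,
    so d(logit_c)/dx = weights[c], and the per-feature attribution is
    weights[c] * x elementwise."""
    grad = weights[target]  # gradient of the target logit w.r.t. x
    return grad * x         # elementwise attribution per feature

W = np.array([[1.0, -2.0, 0.5],
              [0.0,  1.0, 3.0]])
x = np.array([2.0, 1.0, -1.0])
attr = gradient_x_input(W, x, target=0)
```

    A useful sanity check for the linear case: the attributions sum exactly to the target logit, a completeness property that deep-network attribution methods can only approximate.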